2022/03/18

Today’s Schedule

  • Finish off R stuff!
  • Discussion about Mediation and Moderation!
    • Baron and Kenny!
    • Rucker, Preacher, Tormala, and Petty!
    • Rohrer!
    • Spencer, Zanna, and Fong!
  • Mediation/Moderation in R!

‘Weird Data’: Dates

  • Does every year have 365 days?
  • Does every day have 24 hours?
  • Does every minute have 60 seconds?
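Not always! These edge cases (leap years, daylight saving, leap seconds) are exactly why date arithmetic is tricky, and why lubridate distinguishes clock-time periods from exact-time durations. A minimal sketch (assuming the lubridate package is installed):

```r
library(lubridate)

leap_year(2020)  # TRUE: 2020 has 366 days

# 2022-03-13 was a 23-hour day in US Pacific time (DST began);
# periods follow the clock, durations count exact seconds
dst_day <- ymd_hms("2022-03-13 00:00:00", tz = "America/Los_Angeles")
dst_day + days(1)   # next day, same clock time
dst_day + ddays(1)  # exactly 86,400 seconds later (a different clock time!)
```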

Date/Time Data Forms

  • A date (<date>)
  • A time within a day (<time>)
  • A date-time is a date plus a time (<dttm>).

For instance:

library(lubridate) # not loaded with tidyverse
today() # a <date>
## [1] "2022-03-17"
now() # a <dttm>
## [1] "2022-03-17 20:34:19 PDT"

lubridate

Many functions for converting vectors:

ymd("2017-01-31")
## [1] "2017-01-31"
mdy("January 31st, 2017")
## [1] "2017-01-31"
dmy("31-Jan-2017")
## [1] "2017-01-31"

Also of interest: as_date(), make_date(), as.duration()
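A quick sketch of those helpers (assuming lubridate is loaded):

```r
library(lubridate)

make_date(2017, 1, 31)   # build a <date> from numeric components
as_date("2017-01-31")    # coerce a string to <date>

# Express a time span in exact seconds (172,800 seconds = ~2 days)
as.duration(interval(ymd("2017-01-31"), ymd("2017-02-02")))
```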

Modeling Basics: The Formula Interface

  • Used by R to generate “design matrices” for modeling functions
  • Denoted in code by an expression containing a tilde (~)
  • Easiest to picture with a linear regression model, such as:
my_mod <- lm(y ~ x, data = dat)

The formula interface defines the columns to be used in the design matrix (as well as the rows, if a subset argument is also passed to lm()!).

This interface is used by a wide variety of packages and purposes!
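For instance, the subset argument mentioned above restricts which rows enter the design matrix. A minimal sketch (dat, y, x, and group here are hypothetical):

```r
# Hypothetical data frame with a grouping column
dat <- data.frame(y = rnorm(10), x = rnorm(10),
                  group = rep(c("a", "b"), each = 5))

# Only rows where group == "a" enter the design matrix
my_mod <- lm(y ~ x, data = dat, subset = group == "a")
nobs(my_mod)  # 5 observations used, not 10
```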

Key Symbols for Creating Formulas

  • The tilde (~): In a two-sided formula, separates the response variables (left-side) from the predictor variables (right-side); can also be used to invoke a one-sided formula.
  • The plus sign (+): Used to add main effects for additional variables (e.g., y ~ x1 + x2).
  • The minus sign (-): Used to remove terms from a model.
  • The colon (:): Used to incorporate a particular interaction.
  • The asterisk (*): Used to incorporate a “crossing” (the highest order interaction between the terms and all lower-order forms).
  • The %in% operator: Sometimes used to declare nesting between variables.

Some Examples:

# Cross-Tables (rows by columns)
g1 ~ g2 

# Independent Samples t-Test
y ~ group

# Equivalent Interactions:
y ~ x1 * x2
y ~ x1 + x2 + x1:x2

# Polynomial Regression:
y ~ x + I(x^2) + I(x^3)

# Factorial ANOVA (main effects and all two-way interactions):
y ~ (a*b*c)^2
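The minus sign and %in% do not appear in the examples above; a quick sketch of both (using hypothetical variables a, b, c, x, and y):

```r
# Remove the intercept with the minus sign
y ~ x - 1

# Crossing minus the three-way term: main effects + all two-way interactions
y ~ a*b*c - a:b:c

# Nesting: b within a (expands to a + a:b)
y ~ a + b %in% a
```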

The Readings:

  • Baron, R. M., & Kenny, D. A. (1986). The moderator–mediator variable distinction in social psychological research: Conceptual, strategic, and statistical considerations. Journal of Personality and Social Psychology, 51(6), 1173–1182.
  • Rucker, D. D., Preacher, K. J., Tormala, Z. L., & Petty, R. E. (2011). Mediation analysis in social psychology: Current practices and new recommendations. Social and Personality Psychology Compass, 5(6), 359–371.
  • Rohrer, J. M. (2018). Thinking clearly about correlations and causation: Graphical causal models for observational data. Advances in Methods and Practices in Psychological Science, 1(1), 27–42.
  • Spencer, S. J., Zanna, M. P., & Fong, G. T. (2005). Establishing a causal chain: Why experiments are often more effective than mediational analyses in examining psychological processes. Journal of Personality and Social Psychology, 89(6), 845–851.

Supplemental

R packages for mediation analyses:

  • BayesMed (Nuijten, Wetzels, Matzke, Dolan, & Wagenmakers, 2015)
  • bmem (Zhang & Wang, 2011)
  • mediation (Tingley, Yamamoto, Hirose, Keele, & Imai, 2014)
  • powerMediation (Qiu, 2015)
  • RMediation (Tofighi & MacKinnon, 2010)

Functions within other packages:

  • mediate() in psych package (Revelle, 2012)
  • mediation() in MBESS package (Kelley & Lai, 2012)

Example 1: A Moderation Model

The following example data (epi.bfi) come from the psych package (Revelle, 2012). Does the relation between neuroticism (X) and depression (Y) depend on state anxiety (Z)?

The Data

  • N = 231 undergraduate students from a Midwest school
  • bdi = Beck Depression Inventory
  • epiNeur = Neuroticism from Eysenck Personality Inventory
  • stateanx = state anxiety

The Setup

# install.packages("psychTools")
library(psychTools)
data(epi.bfi)
head(epi.bfi, n = 4)
##   epiE epiS epiImp epilie epiNeur bfagree bfcon bfext bfneur bfopen bdi
## 1   18   10      7      3       9     138    96   141     51    138   1
## 2   16    8      5      1      12     101    99   107    116    132   7
## 3    6    1      3      2       5     143   118    38     68     90   4
## 4   12    6      4      3      15     104   106    64    114    101   8
##   traitanx stateanx
## 1       24       22
## 2       41       40
## 3       37       44
## 4       54       40

Fitting the model:

# Using lm() for the linear model formulation
example1 <- lm(bdi ~ stateanx*epiNeur, data=epi.bfi)

The Results:

summary(example1)
## 
## Call:
## lm(formula = bdi ~ stateanx * epiNeur, data = epi.bfi)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -12.0493  -2.2513  -0.4707   2.1135  11.9949 
## 
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)   
## (Intercept)       0.06367    2.18559   0.029   0.9768   
## stateanx          0.03750    0.06062   0.619   0.5368   
## epiNeur          -0.14765    0.18869  -0.782   0.4347   
## stateanx:epiNeur  0.01528    0.00466   3.279   0.0012 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.12 on 227 degrees of freedom
## Multiple R-squared:  0.4978, Adjusted R-squared:  0.4912 
## F-statistic: 75.02 on 3 and 227 DF,  p-value: < 2.2e-16

WARNING

Note! We have a significant interaction. We may want to probe this interaction by creating a simple slopes plot.

The rockchalk package (Johnson, 2015) offers a variety of helpful functions to generate simple slopes plots, test simple slopes, and generate regression tables.

# install.packages("rockchalk")
library(rockchalk)

Simple Slopes:

The plotSlopes() function produces a simple slopes plot.

We have to specify our model, followed by the variable to be plotted on the x-axis, followed by the other variable in the interaction (our “moderator”), followed by the values at which we want to set the moderator.

R tip: Remember that predictor names need to be included in quotation marks!

Simple Slopes:

plotSlopes(example1, plotx="epiNeur", modx="stateanx", 
           modxVals="std.dev.", main = "Simple Slopes Plot Example")
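rockchalk can also test whether each simple slope differs from zero. A sketch using testSlopes(), which takes the object returned by plotSlopes() (output details may vary by rockchalk version):

```r
# Save the plotSlopes() output, then test each simple slope
ps <- plotSlopes(example1, plotx = "epiNeur", modx = "stateanx",
                 modxVals = "std.dev.")
testSlopes(ps)  # t-tests for each simple slope; for a numeric moderator,
                # also reports the Johnson-Neyman interval
```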

Simple Slopes, Customized:

Instead of using standard deviations (via the “std.dev.” option for modxVals), we can specify “quantile” or our own values. Since state anxiety ranges from 21 to 79, perhaps we are interested in the slopes at 30, 40, 50, 60, and 70:

plotSlopes(example1, plotx="epiNeur", modx = "stateanx", 
           modxVals = c(30,40,50,60,70), main = "Simple Slopes Plot Example")

Confidence Intervals:

If you want to add 95% CIs around the slopes, use plotCurves():

plotCurves(example1, plotx="epiNeur", modx = "stateanx", modxVals ="std.dev.",
           interval = "confidence", main = "Simple Slopes Plot Example")

plotCurves(example1, plotx="epiNeur", modx="stateanx", modxVals = c(30,40,50,60,70),
           interval="confidence", main = "Simple Slopes Plot Example")

Example 2: Simple Mediation Model

Data are from Pollack, VanEpps, & Hayes (2012; also found in Hayes, 2013).

Does economic stress (X) lead to a desire to withdraw from small business (Y), as a result of negative affect (M)?

VARIABLES:

  • X = estress (1-7 Likert scale)
  • M = affect (1-5 Likert scale)
  • Y = withdraw (1-7 Likert scale)

estressData <- read.table("estress.csv", sep = ",", header = TRUE)
# install.packages("MBESS")
library(MBESS)

The Mediation Model:

results2 <- mediation(x = estressData$estress, 
                      mediator = estressData$affect, 
                      dv = estressData$withdraw, 
                      bootstrap = TRUE, B = 10000, 
                      which.boot = "BCa")
## [1] "Bootstrap resampling has begun. This process may take a considerable amount of time if the number of replications is large, which is optimal for the bootstrap procedure."
# Note: This conducts 10,000 replications to calculate bias-corrected and 
# accelerated bootstrap confidence intervals

The Results:

This results2 object contains:

  • M on X regression table and model fit
  • Y on X and M regression table and model fit
  • Bootstrap results
  • Measures of indirect effect
  • Residual based indices
  • Example 95% confidence interval for indirect effect

The Results:

round(results2,3)
##                                        Estimate CI.Lower_BCa CI.Upper_BCa
## Indirect.Effect                           0.133        0.078        0.213
## Indirect.Effect.Partially.Standardized    0.107        0.059        0.165
## Index.of.Mediation                        0.152        0.085        0.235
## R2_4.5                                   -0.003       -0.035        0.037
## R2_4.6                                    0.020        0.007        0.046
## R2_4.7                                    0.113        0.018        0.193
## Ratio.of.Indirect.to.Total.Effect         2.369       -1.000      305.891
## Ratio.of.Indirect.to.Direct.Effect       -1.730      -46.761        3.414
## Success.of.Surrogate.Endpoint             0.325       -0.816        0.853
## Residual.Based_Gamma                      0.024       -0.001        0.055
## Residual.Based.Standardized_gamma         0.037        0.011        0.071
## SOS                                      -0.658    -2037.209        0.999

The Indirect Effect:

We could ask for the first row of the Bootstrap.Results to see the estimate of the indirect effect:

round(results2[1,], 3)
##     Estimate CI.Lower_BCa CI.Upper_BCa 
##        0.133        0.078        0.213

Example 3: Multiple Mediators Model

Suppose you have 3 predictors (X1, X2, & X3) for 2 mediators (M1 & M2) that predict one outcome (Y). Data are from the Mplus version 7 (Muthén & Muthén, 2012) user’s guide.

The Data:

ex3 <- read.table("ex2data.txt", header = TRUE)

This model requires a latent variable modeling package to run. Typical options are lavaan, sem, or OpenMx.

#install.packages("lavaan")
library(lavaan)

THE MODEL…

example3 <- '  ## define regressions
          m1 ~ a1*x1 + x2 + x3
          m2 ~ a2*x1 + x2 + x3
          y1 ~ b1*m1 + b2*m2 + x2
          ## define indirect effects
          ind1 := a1 * b1
          ind2 := a2 * b2
          totalind := ind1 + ind2
          ## correlated residual variances
          m1 ~~ m2 '

The Call…

# Use sem() to analyze the model
results3 <- sem(example3, data = ex3, meanstructure = TRUE, se = "boot", bootstrap = 1000)

# The model will take a few minutes to run because we have requested 1000 replications.

The Results…

# Use summary() to view results
summary(results3, standardized=TRUE)
## lavaan 0.6-9 ended normally after 39 iterations
## 
##   Estimator                                         ML
##   Optimization method                           NLMINB
##   Number of model parameters                        16
##                                                       
##   Number of observations                           500
##                                                       
## Model Test User Model:
##                                                       
##   Test statistic                               706.322
##   Degrees of freedom                                 2
##   P-value (Chi-square)                           0.000
## 
## Parameter Estimates:
## 
##   Standard errors                            Bootstrap
##   Number of requested bootstrap draws             1000
##   Number of successful bootstrap draws            1000
## 
## Regressions:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##   m1 ~                                                                  
##     x1        (a1)    2.935    0.050   58.869    0.000    2.935    0.759
##     x2                1.992    0.050   39.522    0.000    1.992    0.497
##     x3                1.023    0.049   20.892    0.000    1.023    0.256
##   m2 ~                                                                  
##     x1        (a2)    2.672    0.067   40.100    0.000    2.672    0.507
##     x2                3.544    0.073   48.361    0.000    3.544    0.649
##     x3                2.318    0.075   31.099    0.000    2.318    0.426
##   y1 ~                                                                  
##     m1        (b1)   -0.613    0.053  -11.637    0.000   -0.613   -0.607
##     m2        (b2)    1.116    0.048   23.458    0.000    1.116    1.506
##     x2               -0.642    0.123   -5.206    0.000   -0.642   -0.159
## 
## Covariances:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##  .m1 ~~                                                                 
##    .m2                1.104    0.089   12.369    0.000    1.104    0.552
## 
## Intercepts:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##    .m1               -0.042    0.052   -0.799    0.424   -0.042   -0.010
##    .m2                0.499    0.078    6.425    0.000    0.499    0.089
##    .y1               -1.665    0.091  -18.275    0.000   -1.665   -0.399
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##    .m1                1.408    0.082   17.240    0.000    1.408    0.082
##    .m2                2.842    0.173   16.432    0.000    2.842    0.089
##    .y1                3.781    0.241   15.662    0.000    3.781    0.217
## 
## Defined Parameters:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##     ind1             -1.798    0.161  -11.154    0.000   -1.798   -0.460
##     ind2              2.981    0.149   20.051    0.000    2.981    0.763
##     totalind          1.183    0.065   18.152    0.000    1.183    0.303

parameterEstimates(results3, output = "text")
## 
## Regressions:
##                    Estimate  Std.Err  z-value  P(>|z|) ci.lower ci.upper
##   m1 ~                                                                  
##     x1        (a1)    2.935    0.050   58.869    0.000    2.839    3.032
##     x2                1.992    0.050   39.522    0.000    1.895    2.091
##     x3                1.023    0.049   20.892    0.000    0.929    1.121
##   m2 ~                                                                  
##     x1        (a2)    2.672    0.067   40.100    0.000    2.548    2.809
##     x2                3.544    0.073   48.361    0.000    3.393    3.683
##     x3                2.318    0.075   31.099    0.000    2.179    2.470
##   y1 ~                                                                  
##     m1        (b1)   -0.613    0.053  -11.637    0.000   -0.719   -0.514
##     m2        (b2)    1.116    0.048   23.458    0.000    1.021    1.207
##     x2               -0.642    0.123   -5.206    0.000   -0.892   -0.390
## 
## Covariances:
##                    Estimate  Std.Err  z-value  P(>|z|) ci.lower ci.upper
##  .m1 ~~                                                                 
##    .m2                1.104    0.089   12.369    0.000    0.914    1.274
## 
## Intercepts:
##                    Estimate  Std.Err  z-value  P(>|z|) ci.lower ci.upper
##    .m1               -0.042    0.052   -0.799    0.424   -0.145    0.067
##    .m2                0.499    0.078    6.425    0.000    0.346    0.653
##    .y1               -1.665    0.091  -18.275    0.000   -1.848   -1.490
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|) ci.lower ci.upper
##    .m1                1.408    0.082   17.240    0.000    1.242    1.566
##    .m2                2.842    0.173   16.432    0.000    2.491    3.159
##    .y1                3.781    0.241   15.662    0.000    3.318    4.239
## 
## Defined Parameters:
##                    Estimate  Std.Err  z-value  P(>|z|) ci.lower ci.upper
##     ind1             -1.798    0.161  -11.154    0.000   -2.116   -1.492
##     ind2              2.981    0.149   20.051    0.000    2.705    3.276
##     totalind          1.183    0.065   18.152    0.000    1.060    1.322

The Results…

fitMeasures(results3)
##                npar                fmin               chisq                  df 
##              16.000               0.706             706.322               2.000 
##              pvalue      baseline.chisq         baseline.df     baseline.pvalue 
##               0.000            4107.449              12.000               0.000 
##                 cfi                 tli                nnfi                 rfi 
##               0.828              -0.032              -0.032               1.000 
##                 nfi                pnfi                 ifi                 rni 
##               0.828               0.138               0.828               0.828 
##                logl   unrestricted.logl                 aic                 bic 
##           -2716.785           -2363.623            5465.569            5533.003 
##              ntotal                bic2               rmsea      rmsea.ci.lower 
##             500.000            5482.218               0.839               0.788 
##      rmsea.ci.upper        rmsea.pvalue                 rmr          rmr_nomean 
##               0.892               0.000               0.233               0.264 
##                srmr        srmr_bentler srmr_bentler_nomean                crmr 
##               0.054               0.054               0.061               0.061 
##         crmr_nomean          srmr_mplus   srmr_mplus_nomean               cn_05 
##               0.072               0.054               0.061               5.241 
##               cn_01                 gfi                agfi                pgfi 
##               7.520               0.992               0.890               0.073 
##                 mfi                ecvi 
##               0.494               1.477

The Results…

fitted(results3)
## $cov
##    m1     m2     y1     x1     x2     x3    
## m1 17.103                                   
## m2 20.989 31.757                            
## y1 11.437 19.935 17.433                     
## x1  3.373  3.059  1.321  1.143              
## x2  2.335  4.104  2.463  0.039  1.066       
## x3  1.119  2.673  2.236 -0.058  0.096  1.074
## 
## $mean
##     m1     m2     y1     x1     x2     x3 
##  0.027  0.499 -1.108  0.046 -0.027 -0.012

The Results…

resid(results3)
## $type
## [1] "raw"
## 
## $cov
##    m1     m2     y1     x1     x2     x3    
## m1  0.000                                   
## m2  0.000  0.000                            
## y1  0.000  0.000  0.000                     
## x1  0.000  0.000 -0.285  0.000              
## x2  0.000  0.000  0.000  0.000  0.000       
## x3  0.000  0.000  1.177  0.000  0.000  0.000
## 
## $mean
## m1 m2 y1 x1 x2 x3 
##  0  0  0  0  0  0

Next Time on…

Readings on “Latent Variable Modelling and Meta-Analysis”:

  • Bollen, K. (2002). Latent variables in psychology and the social sciences. Annual Review of Psychology, 53(1), 605–634.
  • Hinkin, T. R. (1998). A brief tutorial on the development of measures for use in survey questionnaires. Organizational Research Methods, 1(1), 104–121.
  • Zanon, C., Hutz, C. S., Yoo, H., & Hambleton, R. (2016). An application of item response theory to psychological test development. Psicologia: Reflexão e Crítica, 29(18).
  • Baldwin, S. A., & Shadish, W. R. (2011). A primer on meta-analysis in clinical psychology. Journal of Experimental Psychopathology, 294–317.
  • Hohn, R., Slaney, K., & Tafreshi, D. (2019). Primary study quality in psychological meta-analyses: An empirical assessment of recent practice. Frontiers in Psychology, 9, 1–15.